This research is dedicated to Erich L. Lehmann, the thesis advisor of one
of us and "grand thesis advisor" of the others. It is a work in which we try
to develop nonparametric methods for doing inference in a setting,
unlabeled networks, that he never considered. However, his influence shows
in our attempt to formulate and develop a nonparametric model in this
context. We also intend to study to what extent a potentially "optimal"
method such as maximum likelihood can be analyzed and used in this
context. In this respect, this is the first step on a road he always felt was
the main one to stick to.

Probability models on graphs are becoming increasingly important in many
applications, but statistical tools for fitting such models are not yet well
developed. Here we propose a general method of moments approach that can be
used to fit a large class of probability models through empirical counts of
certain patterns in a graph. We establish some general asymptotic properties
of empirical graph moments and prove consistency of the estimates as the
graph size grows for all ranges of the average degree including $\Omega(1)$.
Additional results are obtained for the important special case of degree
distributions.

1. Introduction. The analysis of network data has become an important 
component of doing research in many fields; examples include social and 
friendship networks, food webs, protein interaction and regulatory networks 
in genomics, the World Wide Web and computer networks. On the algorithmic
side, many algorithms for identifying important network structures
such as communities have been proposed, mainly by computer scientists and
physicists; on the mathematical side, various probability models for random
graphs have been studied. However, there has only been a limited amount of 
research on statistical inference for networks, and on learning the network 
features by fitting models to data; to a large extent, this is due to the gap 
between the relatively simple models that are analytically tractable and the 
complex features of real networks not easily reproduced by these models. 

Probability models on infinite graphs have a nice general representation
based on results [Aldous (1981), Hoover (1979), Kallenberg (2005), Diaconis
and Janson (2008)], analogous to de Finetti's theorem, for exchangeable
matrices. Here, we give a brief summary closely following the notation of
Bickel and Chen (2009). Graphs can be represented through their adjacency
matrix $A$, where $A_{ij} = 1$ if there is an edge from node $i$ to $j$ and
$0$ otherwise. We assume $A_{ii} = 0$, that is, there are no self-loops. The
$A_{ij}$'s can also represent edge weights if the graph is weighted, and for
undirected graphs, which is our focus here, $A_{ij} = A_{ji}$. For an
unlabeled random graph, it is natural to require its probability
distribution $P$ on the set of all matrices $\{[A_{ij}], i, j \ge 1\}$ to
satisfy $[A_{\sigma_i \sigma_j}] \sim P$, where $\sigma$ is an arbitrary
permutation of node indices. In that case, using the characterizations above
one can write

(1.1)  $A_{ij} = g(\alpha, \xi_i, \xi_j, \lambda_{ij}),$

where $\alpha$, $\xi_i$ and $\lambda_{ij}$ are i.i.d. random variables
distributed uniformly on $(0, 1)$, $\lambda_{ij} = \lambda_{ji}$ and $g$ is a
function symmetric in its second and third arguments. Here $\alpha$, as in de
Finetti's theorem, corresponds to the mixing distribution and is not
identifiable. The equivalent of the i.i.d. sequences in de Finetti's theorem
here are distributions of the form $A_{ij} = g(\xi_i, \xi_j, \lambda_{ij})$.
This representation is not unique, and $g$ is not identifiable. These
distributions can be parametrized through the function

(1.2)  $h(u, v) = P[A_{ij} = 1 \mid \xi_i = u, \xi_j = v].$

The function $h$ is still not unique, but it can be shown that if two
functions $h_1$ and $h_2$ define the same distribution $P$, they can be
related through a measure-preserving transformation, and a unique canonical
$h$ can be defined, with the property that $\int_0^1 h_{can}(u, v)\,dv$ is
monotone nondecreasing in $u$; see Bickel and Chen (2009) for details. From
now on, $h$ will refer to the canonical $h_{can}$. We use the following
parametrization of $h$: let

(1.3)  $\rho = \int_0^1 \int_0^1 h(u, v)\,du\,dv$

be the probability of an edge in the network. Then the density of
$(\xi_i, \xi_j)$ conditional on $A_{ij} = 1$ is given by

(1.4)  $w(u, v) = \rho^{-1} h(u, v).$

With this parametrization, it is natural to let $\rho = \rho_n$, make $w$
independent of $n$ and control the rate of the expected degree
$\lambda_n = (n - 1)\rho_n$ as $n \to \infty$. The case most studied in
probability on random graphs is $\lambda_n = \Omega(1)$ [where
$a_n = \Omega(b_n)$ means $a_n = O(b_n)$ and $b_n = O(a_n)$]. The case of
$\lambda_n = 1$ corresponds to the so-called phase transition, with the giant
connected component emerging for $\lambda_n > 1$.

Many previously studied probability models for networks fall into this 
class. It includes the block model [Holland, Laskey and Leinhardt (1983), 
Snijders and Nowicki (1997), Nowicki and Snijders (2001)], the configuration 
model [Chung and Lu (2002)] and many latent variable models, including 
the univariate [Hoff, Raftery and Handcock (2002)] and multivariate
[Handcock, Raftery and Tantrum (2007)] latent variable models, and latent
feature models [Hoff (2007)]. In fact, dynamically defined models such as the
"preferential attachment" model [which seems to have been first mentioned by
Yule in the 1920s, formally described by de Solla Price (1965) and given its
modern name by Barabási and Albert (1999)] can also be thought of in this
way if the dynamical construction process continues forever producing an
infinite graph; see Section 16 of Bollobás, Janson and Riordan (2007).

Bickel and Chen (2009) pointed out that the block model provides a natural
parametric approximation to the nonparametric model (1.2), and the
block model is the main parametric model we consider in this paper; see
more details in Section 3. The block model can be defined as follows: each
node $i = 1, \ldots, n$ is assigned to one of $K$ blocks independently of the
other nodes, with $P(c_i = a) = \pi_a$, $1 \le a \le K$,
$\sum_{a=1}^K \pi_a = 1$, where $K$ is known, and $c = (c_1, \ldots, c_n)$ is
the $n \times 1$ vector of labels representing node assignments to blocks.
Then, conditional on $c$, edges are generated independently with
probabilities $P[A_{ij} = 1 \mid c_i = a, c_j = b] = F_{ab}$. The vector of
probabilities $\pi = \{\pi_1, \ldots, \pi_K\}$ and the $K \times K$ symmetric
matrix $F = [F_{ab}]_{1 \le a, b \le K}$ together specify a block model. The
block model is typically fitted either in the Bayesian framework through some
type of Gibbs sampling [Snijders and Nowicki (1997)] or by maximizing the
profile likelihood using a stochastic search over the node labels [Bickel and
Chen (2009)]. Bickel and Chen (2009) also established conditions under which
modularity-type criteria such as the Newman-Girvan modularity [see Newman
(2006) and references therein] give consistent estimates of the node labels
in the block model, under the condition of the graph degree growing faster
than $\log n$, where $n$ is the number of nodes. They showed that the profile
likelihood criterion satisfies these conditions.
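
The generative definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sample_block_model`, the zero-based block labels and the example values of `pi` and `F` are ours, not part of the paper.

```python
import random

def sample_block_model(n, pi, F, seed=0):
    """Draw labels c and a symmetric 0/1 adjacency matrix A from a K-block model.

    pi[a] is the probability of block a (blocks are numbered from 0 here),
    F[a][b] is the edge probability between blocks a and b.
    """
    rng = random.Random(seed)
    K = len(pi)
    # Each node is assigned to a block independently with P(c_i = a) = pi[a].
    c = rng.choices(range(K), weights=pi, k=n)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Conditional on the labels, edges are independent Bernoulli draws.
            if rng.random() < F[c[i]][c[j]]:
                A[i][j] = A[j][i] = 1  # undirected graph, no self-loops
    return c, A

# Hypothetical two-block example.
pi = [0.5, 0.5]
F = [[0.30, 0.05], [0.05, 0.20]]
c, A = sample_block_model(60, pi, F, seed=1)
```

The double loop visits each pair once, so the cost is quadratic in n; for sparse graphs one would normally store edge lists instead of a dense matrix.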

The block model is very attractive from the analytical point of view and 
useful in a number of applications, but the class (1.2) is much richer than the 
block model itself. Moreover, the block model cannot deal with nonuniform 
edge distributions within blocks, such as the commonly encountered "hubs," 
although a modification of the block model introducing extra node-specific 
parameters has been recently proposed by Karrer and Newman (2011) to
address this shortcoming. It may also be difficult to obtain accurate results 
from fitting the block model by maximum likelihood when the graph is 
sparse. 

In this paper, we develop an alternative approach to fitting models of type 
(1.2), via the classical tool of the method of moments. By moments, we mean 
empirical or theoretical frequencies of occurrences of particular patterns in 
a graph, such as commonly used triangles and stars, although the theory is 
for general patterns. While specific parametric models like the block model 
can be fitted by other methods, the method of moments applies much more 
generally, and leads to some general theoretical results on graph moments 
along the way. We note that related work on the method of moments was 
carried out for some specific parametric models in Picard et al. (2008). 

A well-studied class of random graph models where moments play a big
role is the exponential random graph models (ERGMs). ERGMs are an
exponential family of probability distributions on graphs of fixed size that
use network moments such as number of edges, p-stars and triangles as
sufficient statistics. ERGMs were first proposed by Holland and Leinhardt
(1981) and Frank and Strauss (1986) and have since been generalized in
various ways by including nodal covariates or forcing particular constraints
on the parameter space; see Robins et al. (2007) and references therein.
While the ERGMs are relatively tractable, fitting them is difficult since the
partition function can be notoriously hard to estimate. Moreover, they often
fail to provide a good fit to data. Recent research has shown that a wide
range of ERGMs are asymptotically either too simplistic, that is, they become
equivalent to Erdős-Rényi graphs, or nearly degenerate, that is, have no
edges or are complete; see Handcock (2003) for empirical studies and
Chatterjee and Diaconis (2011) and Shalizi and Rinaldo (2011) for theoretical
analysis.

The rest of the paper is organized as follows. In Section 2, we set up
the notation and problem formulation and study the distribution of empirical
moments, proving a central limit theorem for acyclic patterns. We also
work out examples for several specific patterns. In Section 3 we show how
to use the method of moments to fit the block model, as well as identify a
general nonparametric model of type (1.2). In Section 4, we focus on degree
distributions, which characterize (asymptotically) the model (1.2). Section 5
discusses the relationship between normalized degrees and more complicated
pattern counts that can be used to simplify computation of empirical
moments. Section 6 concludes with a discussion. Proofs and additional lemmas
are given in the Appendix.
2. The asymptotic distribution of moments.

2.1. Notation and theory. We start by setting up notation. Let $G_n$ be a
random graph on vertices $1, \ldots, n$, generated by

(2.1)  $P(A_{ij} = 1 \mid \xi_i = u, \xi_j = v) = h_n(u, v) = \rho_n w(u, v) I(w \le \rho_n^{-1}),$

where $w(u, v) \ge 0$, symmetric, $0 \le u, v \le 1$, $\rho_n \to 0$. We
cannot, unfortunately, treat $\rho_n$ and $w$ as two completely free
parameters, as we need to ensure that $h \le 1$. We can either assume that
the sequence $\rho_n$ is such that $\rho_n w \le 1$ for all $n$, or restrict
our attention to classes where
$w_n(u, v) = w(u, v) I(w(u, v) \le \rho_n^{-1}) \to w(u, v)$. In either case,
we can ignore the weak dependence of $w_n$ on $\rho_n$ and effectively
replace $w_n$ with $w$.

Let $T : L_2(0, 1) \to L_2(0, 1)$ be the operator defined by

$[Tf](u) \equiv \int_0^1 h(u, v) f(v)\,dv.$

We drop the subscript $n$ on $h$, $T$ when convenient. Similarly, let
$T_w : L_2(0, 1) \to L_2(0, 1)$ be defined by $w$. Let

$D_i = \sum_j A_{ij}, \qquad L = \frac{1}{2} \sum_{i=1}^n D_i, \qquad \bar D = \frac{1}{n} \sum_{i=1}^n D_i = \frac{2L}{n}.$

Thus $D_i$ is the degree of node $i$, $\bar D$ is the average degree and $L$
is the total number of edges in $G_n$.

Let $R$ be a subset of $\{(i, j) : 1 \le i < j \le n\}$. We identify $R$ with
the vertex set $V(R) = \{i : (i, j) \text{ or } (j, i) \in R \text{ for some } j\}$
and the edge set $E(R) = R$. Let $G_n(R)$ be the subgraph of $G_n$ induced by
$V(R)$. Recall that two graphs $R_1$ and $R_2$ are called isomorphic
($R_1 \sim R_2$) if there exists a one-to-one map $\sigma$ of $V(R_1)$ to
$V(R_2)$ such that the map $(i, j) \to (\sigma_i, \sigma_j)$ is one-to-one
from $E(R_1)$ to $E(R_2)$.

Throughout the paper, we will be using two key quantities defined next:

$Q(R) = P(A_{ij} = 1, \text{ all } (i, j) \in R),$

$P(R) = P(E(G_n(R)) = R).$

Next, we give a proposition summarizing some simple relationships between 
P and Q. The proof, which is elementary, is given in the Appendix. Similar 
results are implicit in Diaconis and Janson (2008). 

Proposition 1. If $G_n$ is a random graph, and $R$ a subset of
$\{(i, j) : 1 \le i < j \le n\}$, then

$P(R) = E\Bigl\{\prod_{(i,j) \in R} h(\xi_i, \xi_j) \prod_{(i,j) \in \bar R} (1 - h(\xi_i, \xi_j))\Bigr\}$

(2.2)  $= Q(R) - \sum\{Q(R \cup (i, j)) : (i, j) \in \bar R\}$

       $+ \sum\{Q(R \cup \{(i, j), (k, l)\}) : (i, j), (k, l) \in \bar R\} - \cdots,$

where $\bar R = \{(i, j) \notin R, i \in V(R), j \in V(R)\}$. Further,

(2.3)  $Q(R) = \sum\{P(S) : S \supset R, V(S) = V(R)\}.$

Here $R \subset S$ refers to $S \subset \{(i, j) : i, j \in V(R)\}$.

The quantities $P(R)$ and $Q(R)$ are unknown population quantities which
we can estimate from data, that is, from the graph $G_n$. Define, for
$R \subset \{(i, j) : 1 \le i < j \le n\}$ with $|V(R)| = p$,

$\hat P(R) = \frac{1}{\binom{n}{p} N(R)} \sum\{1(G \sim R) : G \subset G_n\},$

where $N(R)$ is the number of graphs isomorphic to $R$ on vertices
$1, \ldots, p$. For instance, if $R$ is a 2-star consisting of two edges
$(1, 2), (1, 3)$, then $N(R) = 3$. Further, let

$\hat Q(R) = \sum\{\hat P(S) : S \supset R, V(S) = V(R)\}.$

Here we use $R$ and $S$ to denote both a subset and a subgraph. Evidently,

$E\hat P(R) = P(R), \qquad E\hat Q(R) = Q(R).$
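
As a small worked illustration of these estimates, the sketch below computes $\hat P$ for the 2-star and the triangle by brute force over vertex triples, and $\hat Q$ for the 2-star via the sum over supersets (here the 2-star itself and the triangle). It assumes a dense 0/1 adjacency matrix and reads the sum over $G \subset G_n$ as a sum over induced subgraphs on $p$-vertex subsets; the function name and the toy graph are ours.

```python
from itertools import combinations
from math import comb

def two_star_and_triangle(A):
    """Return (P_hat(2-star), P_hat(triangle), Q_hat(2-star)) for adjacency matrix A."""
    n = len(A)
    stars = triangles = 0
    for i, j, k in combinations(range(n), 3):
        edges = A[i][j] + A[j][k] + A[i][k]
        if edges == 2:
            stars += 1        # induced subgraph on {i, j, k} is a 2-star
        elif edges == 3:
            triangles += 1    # induced subgraph is a triangle
    subsets = comb(n, 3)
    P_star = stars / (subsets * 3)     # N(R) = 3 labeled 2-stars on 3 vertices
    P_tri = triangles / (subsets * 1)  # N(R) = 1 for the triangle
    Q_star = P_star + P_tri            # supersets of a 2-star: itself and the triangle
    return P_star, P_tri, Q_star

# Toy graph with edges (0,1), (0,2), (1,2), (2,3).
A = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0]]
P_star, P_tri, Q_star = two_star_and_triangle(A)
# Cross-check: Q_hat(2-star) also equals sum_i C(D_i, 2) / (3 * C(n, 3)).
Q_direct = sum(comb(sum(row), 2) for row in A) / (3 * comb(len(A), 3))
```

Enumerating all triples costs on the order of n^3 operations, so this is only meant for small graphs.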

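The scaling here is controlled by the parameter $\rho_n$, the natural
assumption for which is $\rho_n \to 0$. In that case, $P(R) \to 0$ for any
fixed $R$ with a fixed number of vertices $p$. Therefore we consider the
following rescaling of $P(R)$ and $Q(R)$: writing $|R|$ for $|E(R)|$, let

$\tilde P(R) = \rho_n^{-|R|} P(R), \qquad \tilde Q(R) = \rho_n^{-|R|} Q(R).$

Then we have

(2.4)  $\tilde P(R) = E \prod_{(i,j) \in R} w_n(\xi_i, \xi_j) + O\Bigl(\frac{\lambda_n}{n}\Bigr),$

since

$\rho_n^{-|R|} E\Bigl\{\prod_{(i,j) \in R} h_n(\xi_i, \xi_j) \Bigl[\prod_{(i,j) \in \bar R} (1 - h_n(\xi_i, \xi_j)) - 1\Bigr]\Bigr\} = O(\rho_n) = O\Bigl(\frac{\lambda_n}{n}\Bigr)$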

if f w'^'^^^^~^^^{u,v)dudv <oo. 

Next, we define the natural sample estimates of the population quantities
$\tilde P$ and $\tilde Q$ by

$\check P(R) = \hat\rho_n^{-|R|} \hat P(R), \qquad \check Q(R) = \hat\rho_n^{-|R|} \hat Q(R),$

where $\hat\rho_n = \bar D/(n - 1) = 2L/(n(n - 1))$ is the estimated
probability of an edge. For these rescaled versions of $P$ and $Q$, we have
the following theorem.
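
The quantities entering the rescaling can be computed directly from the adjacency matrix. The following is a minimal sketch, assuming a dense symmetric 0/1 matrix; the helper names `edge_density` and `rescale` are hypothetical.

```python
def edge_density(A):
    """Return the degrees D_i, the edge count L, the average degree and rho_hat."""
    n = len(A)
    degrees = [sum(row) for row in A]
    L = sum(degrees) // 2          # each edge is counted twice in the degree sum
    D_bar = sum(degrees) / n       # average degree, equal to 2L / n
    rho_hat = D_bar / (n - 1)      # estimated edge probability, 2L / (n(n-1))
    return degrees, L, D_bar, rho_hat

def rescale(P_hat, rho_hat, num_edges):
    """Rescaled moment rho_hat^{-|R|} * P_hat(R) for a pattern with |R| = num_edges."""
    return P_hat / rho_hat ** num_edges

# Toy graph with edges (0,1), (0,2), (1,2), (2,3).
A = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0]]
degrees, L, D_bar, rho_hat = edge_density(A)
```

For a single edge pattern, $\hat P(R) = \hat\rho_n$, so its rescaled value is 1 by construction; only patterns with two or more edges carry information beyond the edge density.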
Theorem 1. Suppose $\int_0^1 \int_0^1 w^2(u, v)\,dv\,du < \infty$.

(a) If $\lambda_n \to \infty$, then

(2.5)  $\hat\rho_n / \rho_n \to_P 1,$

(2.6)  $\sqrt{n}\,(\hat\rho_n / \rho_n - 1) \Rightarrow N(0, \sigma^2)$

for some $\sigma^2 > 0$. Suppose further $R$ is fixed, acyclic with
$|V(R)| = p$ and $\int \int w^{2|R|}(u, v)\,du\,dv < \infty$. Then,

(2.7)  $\check P(R) \to_P \tilde P(R), \qquad \sqrt{n}\,(\check P(R) - \tilde P(R)) \Rightarrow N(0, \sigma^2(R)).$

More generally, for any fixed $\{R_1, \ldots, R_k\}$ as above with
$|V(R_j)| \le p$,

(2.8)  $\sqrt{n}\,((\check P(R_1), \ldots, \check P(R_k)) - (\tilde P(R_1), \ldots, \tilde P(R_k))) \Rightarrow N(0, \Sigma(R)).$

(b) Suppose $\lambda_n \to \lambda < \infty$. Conclusions (2.5)-(2.8) continue
to hold save that $\sigma^2(R)$, $\Sigma(R)$ depend on $\lambda$ as well as
$R$.

(c) Even if $R$ is not necessarily acyclic, the same conclusions apply to
$\check Q$ and $\tilde Q$ if $\lambda_n$ is of order $n^{1 - 2/p}$ or higher,
and to $\check P$ and $\tilde P$ under the same condition on $\lambda_n$.

The proof is given in the Appendix. 

Remarks. (1) Note that part (b) yields consistency and asymptotic normality
of acyclic graph moment estimates across the phase transition to a
giant component, that is, for $\lambda < 1$ as well as $\lambda > 1$.

(2) Note that we are, throughout, estimating features of the canonical $w$.
Unnormalized $P$ and $Q$ are trivially $0$ if $\lambda_n$ is not of order $n$.

(3) In view of (2.4), we can use $\check P(R)$ as an estimate of
$\tilde Q(R)$ if $R$ is acyclic and $\lambda_n = o(n^{1/2})$, since in this
case the bias of $\check P$ is of order $o(n^{-1/2})$. The reason for not
using $\check Q(R)$ directly even if $R$ is acyclic is that by (2.3), there
may exist $S \supset R$ which are not acyclic, and we can therefore not
conclude that the theorem also applies to $\check Q$ unless we are in
case (c).

(4) Part (c) of the theorem shows that for graphs with
$\lambda_n = \Omega(n)$, $\check Q$ always gives $\sqrt{n}$-consistent
estimates of any pattern while $\check P$ is not consistent unless we assume
acyclic graphs, since the bias is of order $O(\lambda_n/n) = O(1)$. In the
range $\lambda_n = o(n^{1/2})$ to $\Omega(n)$, what is possible depends on the
pattern. For instance, if $\Delta = \{(1, 2), (2, 3), (3, 1)\}$, a triangle,
$P(\Delta) = Q(\Delta)$ (because there is no other graph on three nodes
containing $\Delta$), and $\check P$ is $\sqrt{n}$-consistent if
$\lambda_n \ge \varepsilon n^{1/3}$ by part (c) but otherwise only consistent
if $\lambda_n \to \infty$.
2.2. Examples of specific patterns. Next we give explicit formulas for
several specific $R$. Our main focus is on wheels (defined next), which, as we
shall see, in principle can determine the canonical $w$.

Definition 1 (Wheels). A $(k, l)$-wheel is a graph with $kl + 1$ vertices and
$kl$ edges isomorphic to the graph with edges
$\{(1, 2), \ldots, (k, k + 1); (1, k + 2), \ldots, (2k, 2k + 1); \ldots; (1, (l - 1)k + 2), \ldots, (lk, lk + 1)\}$.

In other words, a wheel consists of node 1 at the center and $l$ "spokes"
connected to the center, and each spoke is a chain of $k$ edges. We consider
only $k \ge 2$. The number of isomorphic $(k, l)$-wheels on vertices
$1, \ldots, p$ is $N(R) = (kl + 1)!/l!$.

If the graph $R$ is a $(k, l)$-wheel, the theoretical moments have a simple
form and can be expressed in terms of the operator $T$ as follows:

(2.9)  $Q(R) = E(T^k(1)(\xi_1))^l.$

This follows from

$Q(R) = E\bigl(E\bigl(\prod\{h(\xi_i, \xi_j) : (i, j) \in R\} \mid \xi_1\bigr)\bigr) = E(T^k(1)(\xi_1))^l,$

where the first equality holds by the definition of $Q$ and the second by the
structure of a $(k, l)$-wheel.

For a $(k, l)$-wheel $R$, from our general considerations,
$E\hat P(R) = P(R)$ and, after rescaling,
$\tilde P(R) = \tilde Q(R) + o(1)$ if $\lambda_n = o(n)$, and in view of
(2.8), $\check P(R)$ always consistently estimates $\tilde Q(R)$. However,
$\sqrt{n}$-consistency of $\check P$ (converging to $\tilde Q$) holds in
general only if $\lambda_n = o(n^{1/2})$. By part (c), $\check Q$ is
$\sqrt{n}$-consistent for $\tilde Q$ only if $\lambda_n$ is of order larger
than $n^{1 - 2/(kl+1)}$. In the range between $O(n^{1/2})$ and
$O(n^{1 - 2/(kl+1)})$, we do not exhibit a $\sqrt{n}$-consistent estimate
though we conjecture that by appropriate de-biasing of $\check P$ such an
estimate may be constructed. However, $\lambda_n = o(n^{1/2})$ seems a
reasonable assumption for most graphs in practice, and then we can use the
more easily computed $\check P$.

Definition 2 (Generalized wheels). A $(\mathbf{k}, \mathbf{l})$-wheel, where
$\mathbf{k} = (k_1, \ldots, k_t)$, $\mathbf{l} = (l_1, \ldots, l_t)$ are
vectors and the $k_j$'s are distinct integers, is the union
$R_1 \cup \cdots \cup R_t$, where $R_j$ is a $(k_j, l_j)$-wheel,
$j = 1, \ldots, t$, and the wheels $R_1, \ldots, R_t$ share a common hub but
all their spokes are disjoint.

A $(\mathbf{k}, \mathbf{l})$-wheel has a total of $p = \sum_j l_j k_j + 1$
vertices and $\sum_j l_j k_j$ edges. For example, a graph defined by
$E = \{(1, 2); (1, 3), (3, 4); (1, 5), (5, 6); (1, 7), (7, 8), (8, 9)\}$ is a
$(\mathbf{k}, \mathbf{l})$-wheel with $\mathbf{k} = (1, 2, 3)$ and
$\mathbf{l} = (1, 2, 1)$. The number of distinct isomorphic
$(\mathbf{k}, \mathbf{l})$-wheels on $p$ vertices is
$N(R) = p!/\prod_j l_j!$.

We can compute, defining $A(R) = \prod\{A_{ij} : (i, j) \in R\}$,

$Q(R) = P\Bigl(\bigcap_{j=1}^t [A(R_j) = 1]\Bigr)$

(2.10)  $= E\Bigl\{\prod_{j=1}^t P(A(R_j) = 1 \mid \text{Hub})\Bigr\} = E\Bigl\{\prod_{j=1}^t (T^{k_j}(1)(\xi_1))^{l_j}\Bigr\}.$

Thus $(\mathbf{k}, \mathbf{l})$-wheels give us all cross moments of
$T^m(1)(\xi)$, $m \ge 1$. Note that all $(\mathbf{k}, \mathbf{l})$-wheels are
acyclic.

We are not aware of other patterns for which the moment formulas are
as simple as those for wheels. For example, if $R$ is a triangle, then

$Q(R) = \int_0^1 \int_0^1 \int_0^1 h(u, v) h(v, w) h(w, u)\,du\,dv\,dw = \int_0^1 \int_0^1 h^{(2)}(u, w) h(w, u)\,du\,dw,$

where $h^{(2)}(u, w) = \int_0^1 h(u, v) h(v, w)\,dv$ corresponds to
$T^2 f = \int_0^1 h^{(2)}(u, v) f(v)\,dv$.

In general, unions of $(\mathbf{k}, \mathbf{l})$-wheels are also more
complicated. If $R_1, R_2$ are $(\mathbf{k}_1, \mathbf{l}_1)$,
$(\mathbf{k}_2, \mathbf{l}_2)$-wheels which share a single node
[$V(R_1) \cap V(R_2) = \{a\}$], we can compute
$P(R_1 \cup R_2) = E\,P(R_1 \mid \xi_a) P(R_2 \mid \xi_a)$. If $a$ is the hub
of both wheels, then evidently $R_1 \cup R_2$ is itself a generalized wheel,
and (2.10) applies. Otherwise, the formula, as for triangles, is more
complex. However, such unions of $(\mathbf{k}, \mathbf{l})$-wheels are
acyclic.

3. Moments and model identifiability. We establish two results in this
section: identifiability of block models with known $K$ using
$\{\tilde P(R) : R \text{ a } (k, l)\text{-wheel}, 1 \le l \le 2K - 1, 2 \le k \le K\}$,
and the general identifiability of the function $w$ from $\{\tilde P(R)\}$
using all $(\mathbf{k}, \mathbf{l})$-wheels $R$.

3.1. The block model. Let $w$ correspond to a $K$-block model defined
by parameters $\theta = (\pi, \rho_n, S)$, where $\pi_a$ is the probability of
a node being assigned to block $a$ as before, and

$F_{ab} \equiv P(A_{ij} = 1 \mid i \in a, j \in b) = \rho_n S_{ab}, \qquad 1 \le a, b \le K.$

Recall that the function $h$ in (1.2) is not unique, but a canonical $h$ can
be defined. For the block model, we use the canonical $h$ given by Bickel
and Chen (2009). Let $H_{ab} = S_{ab} \pi_a \pi_b$. Let the labeling of the
communities $1, \ldots, K$ satisfy $H_1 \le \cdots \le H_K$, where
$H_a = \sum_b H_{ab}$ is proportional to the expected degree for a member of
block $a$. The canonical function $h$ then takes the value $F_{ab}$ on the
$(a, b)$ block of the product partition where each axis is divided into
intervals of lengths $\pi_1, \ldots, \pi_K$. Let $F \equiv \|F_{ab}\|$.

In view of (2.6), we will treat $\rho_n$ as known. Let
$\{W_{kl} : 1 \le l \le 2K - 1, 2 \le k \le K\}$ be the specified set of
$(k, l)$-wheels, and let

$\tau_{kl} = \rho^{-kl} P(W_{kl}) \equiv \tilde P(W_{kl}), \qquad \hat\tau_{kl} = \check P(W_{kl}).$

Let $f : \Theta \to \mathbb{R}^{(2K-1)(K-1)}$ be the map carrying the
parameters of the block model $\theta \equiv (\pi, S)$ to
$\tau \equiv \|\tau_{kl}\|$. $\Theta$ here is the appropriate open subset of
$\mathbb{R}^{K(K+3)/2 - 2}$. Note that the number of free parameters in the
block model is $K - 1$ for $\pi$ and $K(K + 1)/2$ for $F$, but $S$ only has
$K(K + 1)/2 - 1$ free parameters, to account for $\rho$.

Theorem 2. Suppose $\theta = (\pi, S)$ defines a block model with known $K$,
and the vectors $\pi, F\pi, \ldots, F^{K-1}\pi$ are linearly independent.
Suppose $\varepsilon \le \lambda_n = o(n^{1/2})$. Then:

(a) $\{\tau_{kl} : l = 1, \ldots, 2K - 1, k = 2, \ldots, K\}$ identify the
$K(K + 3)/2 - 2$ parameters of the block model other than $\rho$ (i.e., the
map $f$ is one to one).

(b) If $f$ has a gradient which is of rank $K(K + 3)/2 - 2$ at the true
$(\pi_0, S_0)$, then $f^{-1}(P(\hat\tau))$ is a $\sqrt{n}$-consistent estimate
of $(\pi_0, S_0)$, where $\hat\tau \equiv \|\hat\tau_{kl}\|$ and
$P(\hat\tau)$ is the closest point in the range of $f$ to $\hat\tau$.

Note that the linear independence condition rules out all matrices $F$ that
have 1 as an eigenvector. In particular, it rules out the case of $F_{aa}$
equal for all $a$, $F_{ab}$ equal for all $a \ne b$, which was studied in
detail by Decelle et al. (2011). Using physics arguments, they showed that in
that particular case, when $\lambda = O(1)$, there are regions of the
parameter space where neither the parameters nor the block assignments can be
estimated by any method.
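
The linear independence condition of Theorem 2 is easy to check numerically for a candidate block model. The sketch below is an illustration under our own assumptions: the function name `krylov_rank` and the tolerance are hypothetical, and the rank is computed by plain Gaussian elimination. Since the rank does not change when $F$ is multiplied by a constant, passing $S$ instead of $F$ avoids tolerance problems when $\rho_n$ is small.

```python
def krylov_rank(pi, F, tol=1e-10):
    """Rank of the vectors pi, F pi, ..., F^(K-1) pi (K means they are independent)."""
    K = len(pi)
    rows, v = [], list(pi)
    for _ in range(K):
        rows.append(v[:])
        v = [sum(F[a][b] * v[b] for b in range(K)) for a in range(K)]
    rank = 0
    for col in range(K):
        # Partial pivoting: pick the largest remaining entry in this column.
        pivot = max(range(rank, K), key=lambda r: abs(rows[r][col]))
        if abs(rows[pivot][col]) < tol:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, K):
            factor = rows[r][col] / rows[rank][col]
            rows[r] = [x - factor * y for x, y in zip(rows[r], rows[rank])]
        rank += 1
    return rank

pi = [0.5, 0.5]
# Equal diagonal and equal off-diagonal entries with equal block sizes:
# F pi is proportional to pi, so the condition fails.
rank_symmetric = krylov_rank(pi, [[0.3, 0.1], [0.1, 0.3]])
# Unequal diagonal entries: the condition holds.
rank_generic = krylov_rank(pi, [[0.3, 0.1], [0.1, 0.2]])
```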

Part (b) shows $\sqrt{n}$-consistency of nonlinear least squares estimation of
$(\pi, S)$ using $\hat\tau$ to estimate $\tau(\pi, S)$. The variance of
$\hat\tau_{kl}$ is proportional asymptotically to that of
$E\{\prod_{(i,j) \in W_{kl}} w(\xi_i, \xi_j) \mid \xi_1\}$, where $\xi_1$
corresponds to the hub, which we expect increases exponentially in
$p = kl + 1$. If we knew these variances, we could use weighted nonlinear
least squares. In Section 5, we suggest a bootstrap method by which such
variances can be estimated, but we do not pursue this further in this paper.
3.2. The nonparametric model. In the general case, we express everything
in terms of the operator $T_w \equiv T/\rho_n$ induced by the canonical $w$.
We require that:

(A) the joint distribution of $\{T_w^l(1)(\xi) : l \ge 1\}$ is determined by
the cross moments of $(T_w^{l_1}(1)(\xi), \ldots, T_w^{l_k}(1)(\xi))$, for
$l_1, \ldots, l_k$ arbitrary.

A simple sufficient condition for (A) is $|w| \le M < \infty$. A more
elaborate one is the following:

(A') 

]Ee*«''{6,6) < < |s| < e ah k some e > 0. 

Proposition 2. Condition (A') implies (A). 
The proof is given in the Appendix. 

Let $w$ characterize $T_w$, where
$\int_0^1 \int_0^1 w^2(u, v)\,du\,dv < \infty$. By Mercer's theorem,

(3.1)  $w(u, v) = \sum_j \lambda_j \phi_j(u) \phi_j(v),$

where the $\phi_j$ are orthonormal eigenfunctions and the $\lambda_j$
eigenvalues, $\sum_j \lambda_j^2 < \infty$.

Theorem 3. Suppose $\int_0^1 \int_0^1 w^2(u, v)\,du\,dv < \infty$. Assume the
eigenvalues $\lambda_1 > \lambda_2 > \cdots$ of $T_w$ are each of
multiplicity 1 with corresponding eigenfunctions $\phi_j$ and
$\int_0^1 \phi_j(u)\,du \ne 0$ for all $j$. The joint distribution of
$(T_w(1)(\xi), \ldots, T_w^m(1)(\xi), \ldots)$ then determines, and is
determined by, $w(\cdot, \cdot)$.

Note again that interesting cases are ruled out by the condition that all
eigenfunctions of $T$ are not orthogonal to 1. The general analogue to the
block model case is that $P(A_{ij} = 1 \mid \xi_i)$ cannot be constant for
all $i$ and $j$. Constancy can be interpreted as saying that $A_{ij}$ and the
latent variable $\xi_i$ associated with vertex $i$ are independent. The proof
of Theorem 3 is given in the Appendix. The almost immediate application to
wheels is stated next.

Theorem 4. Suppose assumption (A) and the conditions of Theorem 3
hold. Let $\tau_{\mathbf{kl}} = \tilde P(S_{\mathbf{kl}})$ where
$S_{\mathbf{kl}}$ is a $(\mathbf{k}, \mathbf{l})$-wheel. Then
$S \equiv \{\tau_{\mathbf{kl}} : \text{all } \mathbf{k}, \mathbf{l}\}$
determines $T$. If $\hat\tau_{\mathbf{kl}} = \check P(S_{\mathbf{kl}})$, then
the $\hat\tau_{\mathbf{kl}}$ are $\sqrt{n}$-consistent estimates of
$\tau_{\mathbf{kl}}$, provided that $\lambda_n = o(n^{1/2})$.

Proof. Since the vector $(T(1)(\xi), \ldots, T^l(1)(\xi))$ has a moment
generating function converging on $0 < |s| \le \varepsilon_l$, the moments
(including cross moments) determine the distribution of the vector. By
(2.10), the $\tau_{\mathbf{kl}}$ give all moments of the vector for all $l$.
By Theorem 1, the $\hat\tau_{\mathbf{kl}}$ are $\sqrt{n}$-consistent. □

4. Degree distributions. The average degree $\bar D$ is, as we have seen in
Theorem 1, a natural data dependent normalizer for moment statistics which
eliminates the need to "know" $\rho_n$. In fact, as we show in this section,
the joint empirical distribution of degrees and what we shall call
$m$-degrees below can be used in estimating asymptotic approximations to
$w(\cdot, \cdot)$ in a somewhat more direct way than moment statistics. They
can also be used to approximate moment estimates based on
$(\mathbf{k}, \mathbf{l})$-wheels in a way that potentially simplifies
computation.

We define the $m$-degree of $i$, $D_i^{(m)}$, as the total number of loopless
paths of length $m$ between $i$ and other vertices. Note that the
$D_i^{(m)}$ can be interpreted as the "volume" of the radius $m$ geodesic
sphere around $i$. As for regular degrees, we normalize and consider
$D_i^{(m)}/\bar D^m$, $i = 1, \ldots, n$, and the empirical joint distribution
of vectors $\mathbf{D}_i^{(m)} = (D_i/\bar D, \ldots, D_i^{(m)}/\bar D^m)$,
$i = 1, \ldots, n$. The generalized degrees can be computed as follows: for
all entries of $A^m$, eliminate all terms in the sum defining each entry in
which an index appears more than once to obtain a modified matrix
$A^{(m)} = [A_{ij}^{(m)}]$; then the $D_i^{(m)}$ are given by row sums of
$A^{(m)}$. In other words, letting
$A_{E(R)} = \prod_{(i,j) \in E(R)} A_{ij}$, we can write

$A_{ij}^{(m)} = \sum\{A_{E(R)} : R = \{(i, i_1), (i_1, i_2), \ldots, (i_{m-1}, j)\},\ i, i_1, \ldots, i_{m-1}, j \text{ distinct}\}.$

The complexity of this computation is $O((n + m)\lambda^m)$ (first term is for
computing the row sums of $A^m$ and the second for eliminating the loops).
Define the empirical distribution of the vector of normalized degrees

$\hat F_m(x) = \frac{1}{n} \sum_{i=1}^n 1(\mathbf{D}_i^{(m)} \le x).$

Further, recall the Mallows 2-distance between two distributions $P$ and $Q$,
defined by
$M_2(P, Q) = \min_F \{(E\|X - Y\|^2)^{1/2} : (X, Y) \sim F, X \sim P, Y \sim Q\}$.

A sequence of distribution functions $F_n$ converges to $F$ in $M_2$
($F_n \overset{M_2}{\to} F$) if and only if $F_n \Rightarrow F$ in
distribution, and $F_n$, $F$ have second moments such that
$\int |x|^2\,dF_n(x) \to \int |x|^2\,dF(x)$.

Theorem 5. Suppose $\lambda_n \to \infty$ and
$\int_0^1 \int_0^1 w^{2m}(u, v)\,du\,dv < \infty$. Then $\hat F_m \to F_m$ as
$n \to \infty$, where $F_m$ is the distribution of
$\theta_m(\xi) = (\tau_w(\xi), \ldots, T_w^{m-1}(\tau_w)(\xi))$, and
$\tau_w(\xi) = \int_0^1 w(\xi, v)\,dv$ is monotone increasing. Moreover, if
$\hat G_m(x, y)$ is the empirical distribution of
$(\mathbf{D}_i^{(m)}, \theta_m(\xi_i))$, then

(4.1)

The proof is given in the Appendix. 
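
The $m$-degrees defined in this section can be computed for small graphs by direct enumeration. The sketch below is an illustration only and makes its own choices: it counts loopless paths by depth-first search rather than through the modified matrix power described in the text, it assumes a dense 0/1 adjacency matrix with at least one edge, and the function names are hypothetical.

```python
def m_degrees(A, m):
    """Number of loopless paths of length m starting at each vertex of A."""
    n = len(A)
    nbrs = [[j for j in range(n) if A[i][j]] for i in range(n)]

    def count(v, remaining, visited):
        # Count simple paths with `remaining` more edges that continue from v.
        if remaining == 0:
            return 1
        total = 0
        for u in nbrs[v]:
            if u not in visited:
                visited.add(u)
                total += count(u, remaining - 1, visited)
                visited.remove(u)
        return total

    return [count(i, m, {i}) for i in range(n)]

def normalized_degree_vectors(A, m):
    """Per-vertex vectors (D_i / D_bar, ..., D_i^(m) / D_bar^m)."""
    n = len(A)
    D_bar = sum(map(sum, A)) / n  # average degree; must be positive
    columns = [[d / D_bar ** r for d in m_degrees(A, r)] for r in range(1, m + 1)]
    return [list(vec) for vec in zip(*columns)]

# Toy graph with edges (0,1), (0,2), (1,2), (2,3).
A = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0]]
one_degrees = m_degrees(A, 1)
two_degrees = m_degrees(A, 2)
vectors = normalized_degree_vectors(A, 2)
```

The empirical distribution of the rows of `vectors` is what the theorem compares with the distribution of $\theta_m(\xi)$.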
There is an attractive interpretation of the last statement of Theorem
5. If A„, — oo, A„ = o(n^/("^~^)), m>2, then Di/\n can be identified with 
$\tau(\xi_i)$ in the following sense: while $\xi_i$ is unobserved but
$D_i/\bar D$ is observed, on average $\tau(\xi_i)$ and $D_i/\bar D$ are close.
Since $\tau$ is monotone increasing in $\xi$, that is, is a measure of $\xi$
on another scale, we can treat $D_i/\lambda_n$ as the latent affinity of $i$
to form relationships.

Bollobás, Janson and Riordan (2007) show that if $m = 1$, $\lambda_n = O(1)$, then
the limit of the empirical distribution of the degrees can be described as 
follows: given ^~W(0, 1), the limit distribution is Poisson with mean t^(^). 
The limit of the joint degree distribution in this case can be determined but 
does not seem to give much insight. 

Remark. Theorem 5 shows that the normalized degree distributions can
be used for estimation of parameters only if $\lambda_n \to \infty$. If that
is the case we can proceed as follows:

(1) Let $\hat\tau_1, \ldots, \hat\tau_n$ be the empirical quantiles of the
normalized 1-degree distribution, and let $\hat T^m(\hat\tau_k)$ be the
$m$-degree of the vertex with normalized degree $\hat\tau_k$.

(2) Fit smooth curves to $(\hat\tau_k, \hat T^m(\hat\tau_k))$ viewed as
observations of functions at $\hat\tau_k$, $k = 1, \ldots, n$, for each $m$,
and call these $\hat T^m(\cdot)$ (on $\mathbb{R}$). By Theorem 5,
$\hat T^m(t) \to T_w^{m-1}(\tau)(\tau^{-1}(t))$ for all $t$. If
$T_w^{m-1}(\tau)(\tau^{-1}(\cdot))$ are smooth, the convergence can be made
uniform on compacts.

(3) From the fitted functions T"^{-), we can estimate the parameters of 
block models of any order consistently by replacing in the proof of iden- 
tifiability of block models by fitting the r"*(t) by T'"(t) of the type specified 
by block models and then using the corresponding v^. We only need the 
conditions of Theorem 5. 

5. Computation of moment estimates and estimation of their variances. 

General acyclic graph moment estimates including those corresponding to
patterns arising from $(\mathbf{k}, \mathbf{l})$-wheels are computationally
difficult. For $(k, l)$-wheels with small $k$ and $l$, we can use brute force
counting, but unfortunately, the complexity of moment computation even for
$(k, l)$-wheels appears to be
O(nAn). Note that we need to count the sets of loopless paths of length k, 
$S_{i\mathbf{a}}$, for each $i$, where $S_{i\mathbf{a}}$ is the set of all
paths of length $k$ originating at node $i$ which intersect another such path
at $a_1 < \cdots < a_m$, $1 \le m \le k$, and $S_{i0}$ is the set of all
paths of length $k$ from $i$ which do not intersect. The number of
$(k, l)$-wheels with hub $i$ is then the number of $l$-tuples of such paths
selected so that elements from $S_{i\mathbf{a}}$ appear at most once, with
the remaining paths coming from $S_{i0}$. This is computationally nontrivial.

For very sparse graphs, however, intersecting paths can be ignored up
to a certain order, and the wheel counts can be related to normalized
$m$-degrees via the following approximation. If the conditions of Theorem 5
hold and $\lambda_n = o(n^\alpha)$ for all $\alpha > 0$, then



i=l 

A similar formula holds for $\hat\tau_{\mathbf{kl}}$.

The heuristic argument for (5.1) is that the expected number of paths
of length $k$ from $i$ is $O(\lambda_n^k)$. The expected number of pairs of
such paths which intersect at least once is

0(A^'^)P[two specified paths intersect at least once] 
= 0{Xl'{l - (1 - A„/n)^)) = O (^^) = 

if $\lambda_n = o(n^\alpha)$ for all $\alpha > 0$. Note that for $K$-block
models this condition is not necessary for all $\alpha$, since we only need
to count a finite number of $(k, l)$-wheels.

Estimation of variances of moment estimates even for
$(\mathbf{k}, \mathbf{l})$-wheels involves the counting of more complicated
patterns. However, we propose the following bootstrap method:

(i) Associate with each vertex $i$ the counts of
$(\mathbf{k}, \mathbf{l})$-wheels for which it is a hub,
$S_i = \{n_{i\mathbf{kl}} : \text{all } \mathbf{k}, \mathbf{l}\}$,
$i = 1, \ldots, n$.

(ii) Sample without replacement $m$ vertices $\{i_1, \ldots, i_m\}$, and let

^ m 

m ^-^ ■' 



For $R$ a $(\mathbf{k}, \mathbf{l})$-wheel, define



(n/m) YJj=i "ijki 



' f)*\-\R\ 
P*{R) = P*{R)[ — 



m 

(iii) Repeat this $B$ times to obtain $\check P^*_1, \ldots, \check P^*_B$, and let

b=l 

Then $\hat\sigma^2$ is an estimate of the variance of $\check P(R)$ if $m/n \to 0$, $m \to \infty$.

This scheme works if A„, — ?• oo since, given that the first term of P{R) — 
P{R) is of lower order given .^i, . . . ,^ri, each P*{R) corresponds to a sam- 
ple without replacement from the set of possible We conjecture that 
this bootstrap still works if $\lambda_n = O(1)$. A similar device can be applied to
approximation (5.1). 



METHOD OF MOMENTS FOR NETWORKS 






6. Discussion. 

6.1. Estimation of canonical w generally. Our Theorem 4 suggests that
we might be able to construct consistent nonparametric estimates of
$w_{can}$. That is,
$\tau_M = \{\tau_{\mathbf{kl}} : |\mathbf{k}| \le M, |\mathbf{l}| \le M\}$ can
be estimated at rate $n^{-1/2}$ for all $M < \infty$. But
$\{\tau_M, M \ge 1\}$ determines $T_w$, and thus in principle we can
estimate it arbitrarily closely using $\{\hat\tau_{\mathbf{kl}}\}$. This
appears difficult both theoretically and practically. Theoretically, one
difficulty seems to be that we would need to analyze the expectation of
moments or degree distributions when the block model does not hold, which is
doable. What is worse is that the passage to $w$ from moments is very
ill-conditioned, involving first inversion via solution of the moment
problem, and then estimation of eigenvectors and eigenvalues from a sequence
of iterates $T_w(1), T_w^2(1)$, etc. If we assume $\lambda_n \to \infty$ so
that we can use consistency of the degree distributions, we bypass the moment
problem, but the eigenfunction estimation problem remains. A step in this
direction is a result of Rohe, Chatterjee and Yu (2011) which shows that
spectral clustering can be used to estimate the parameters of $k$-block
models if $\lambda_n \to \infty$ sufficiently fast, even if $k \to \infty$
slowly. Unfortunately this does not deal with the problem we have just
discussed, how to pick a block model which is a good approximation to the
nonparametric model. For reasons which will appear in a future paper,
smoothness assumptions on $w$ have to be treated with caution.

While $\lambda_n \to \infty$ has not occurred in practice in the past, networks with
high average degrees are now appearing routinely. In particular, university
Facebook networks have $\lambda$ of 15 or more with n in the low thousands. In
any case $\lambda_n \to \infty$ can still be useful as an asymptotic regime that can help
us understand some general patterns, in the same way that the sample size
going to infinity does in ordinary statistics. Note that most of the time we
do not specify the rate of growth of $\lambda_n$, which can be very slow.

6.2. Adding covariates and directed graphs. In principle, adding covariates
$X_i$ at each vertex or $X_{ij}$ at each edge simply converts our latent variable
model, $w(\cdot, \cdot)$, into a mixed model

$$P_\theta(A_{ij} = 1 \mid X_i, X_j, X_{ij}, \xi_i, \xi_j) = w_\theta(\xi_i, \xi_j, X_i, X_j, X_{ij}),$$

which can be turned into a logistic mixed model. Special cases of such models
have been considered in the literature; see Hoff (2007) and references therein.
We do not pursue this here. The extension of this model to directed graphs
is also straightforward.

6.3. Dynamic models. Many models in the literature have been specified
dynamically; see Newman (2010). For instance, the "preferential attachment"
model constructs an n-vertex graph by adding one vertex at a time, with
edges of that vertex to previous vertices formed with probabilities which are
functions of the degree of the candidate "old" vertex. If we let $n \to \infty$, we
obtain models of the type we have considered whose w function can be based
on an integral equation for $\tau(\xi)$, our proxy for the degree of the vertex with
latent variable $\xi$. We shall pursue this elsewhere also.
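
For concreteness, one simple member of this family of dynamic models can be simulated as below. This is a minimal sketch under stated assumptions, not the specific model analyzed in the literature cited above: each new vertex forms exactly one edge, and an old vertex is chosen with probability proportional to its current degree plus one. The function name `preferential_attachment` and the "+1" smoothing are choices made for the example.

```python
import numpy as np

def preferential_attachment(n, seed=0):
    """Grow a graph on n vertices, adding one vertex (and one edge) at a time."""
    rng = np.random.default_rng(seed)
    degree = np.zeros(n, dtype=int)
    edges = []
    for new in range(1, n):
        # Attachment weights depend only on the degree of each "old" vertex.
        weights = degree[:new] + 1.0
        old = int(rng.choice(new, p=weights / weights.sum()))
        edges.append((old, new))
        degree[old] += 1
        degree[new] += 1
    return edges, degree

edges, degree = preferential_attachment(50)
```

The resulting degree vector can be inspected directly, for example by sorting `degree`, to see how early vertices tend to accumulate edges.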

APPENDIX: ADDITIONAL LEMMAS AND PROOFS 

Proof of Proposition 1. The first line of (2.2) is immediate, conditioning
on $\{\xi_1, \dots, \xi_n\}$. The second line in (2.2) follows by expanding the
second product. Finally, (2.2) follows directly from the definitions of P and Q.
□

The following standard result is used in the proof of Theorem 1.

Lemma 1. Suppose $(U_n, V_n)$ are random elements such that

$$\mathcal{L}(U_n) \to \mathcal{L}(U), \qquad \mathcal{L}(V_n \mid U_n) \to \mathcal{L}(V)$$

in probability. Then $U_n$, $V_n$ are asymptotically independent, and

$$\mathcal{L}(V_n) \to \mathcal{L}(V).$$

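Proof of Theorem 1. By definition, $E(\bar D/\lambda_n) = 1$. Moreover,

$$\operatorname{Var}\Bigl((n\lambda_n)^{-1} \sum\{A_{ij} : \text{all } 1 \le i < j \le n\}\Bigr)
= (n\lambda_n)^{-2}\, E \operatorname{Var}\Bigl(\sum_{i<j} A_{ij} \Bigm| \xi\Bigr)
+ (n\lambda_n)^{-2} \operatorname{Var} E\Bigl(\sum_{i<j} A_{ij} \Bigm| \xi\Bigr)
= \operatorname{Var}(T_1) + \operatorname{Var}(T_2),$$

where

$$T_1 = (n\lambda_n)^{-1} \sum_{i<j} \bigl(A_{ij} - \rho_n w(\xi_i, \xi_j)\bigr), \qquad
T_2 = (n\lambda_n)^{-1} \sum_{i<j} \rho_n \bigl(w(\xi_i, \xi_j) - 1\bigr).$$

Since $\lambda_n = (n - 1)\rho_n$, the first term is

$$(n\lambda_n)^{-2}\, E \sum\{h_n(\xi_i, \xi_j)(1 - h_n(\xi_i, \xi_j)) : \text{all } i, j\}
= O((n^2 \rho_n)^{-1}) = O((n\lambda_n)^{-1}).$$

The second term is the variance of a U-statistic of order 2, which is well
known to be $O(n^{-1})$. Thus, (2.5) follows in case (a).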

To establish (2.6) and (b), we note that the conditional distribution of
$\sqrt{n\lambda_n}\, T_1$ given $\xi$ is that of a sum of independent random variables with
conditional variance

$$\frac{1}{n\lambda_n} \sum_{i<j} \rho_n w(\xi_i, \xi_j)\bigl(1 - \rho_n w(\xi_i, \xi_j)\bigr)
= \frac{1}{n^2} \sum_{i<j} w(\xi_i, \xi_j)\,(1 + o_P(1)) \to_P \frac{1}{2}.$$

This sum is approximated by a U-statistic of order 2. Note that $E w(\xi_i, \xi_j) = 1$.
Since the maximum of the summands in $\sqrt{n\lambda_n}\, T_1$ tends to 0, by the
Lindeberg-Feller theorem, the conditional distribution tends to $\mathcal{N}(0, \tfrac{1}{2})$ in
probability. We can similarly apply the limit theorem for U-statistics [see
Serfling (1980)] to conclude that

$$\sqrt{n}\, T_2 \Rightarrow \mathcal{N}(0, \operatorname{Var}(\tau(\xi))).$$

Applying Lemma 1, we see that if $\lambda_n = O(1)$, (b) follows. On the other hand,
if $\lambda_n \to \infty$, $\sqrt{n}\, T_1$ is negligible, and the Gaussian limit is determined by $T_2$.

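The proof of (2.7) and (2.8) is similar. We shall decompose $\hat P(R)$ as
$U_1 + U_2$ as we did above. If $\lambda_n \to \infty$, it is enough to prove that

$$\sqrt{n}\,\bigl(\check P(R) - \tilde P(R)\bigr) \Rightarrow \mathcal{N}(0, \sigma^2(R))$$

since replacing $\bar D$ by $n\rho_n = \lambda_n$ gives a perturbation of order
$(n\lambda_n)^{-1/2} = o(n^{-1/2})$.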

In case (b), it is enough to show that the joint distribution of
$\sqrt{n}\,((\hat P(R) - P(R))\rho_n^{-q}, T_1, T_2)$ is Gaussian in the limit, since in view of
(2.5) and (2.6) we can apply the delta method to $\check P(R)$. Let $p = |V(R)|$,
$q = |R|$. Each term in $\hat P(R)$ is of the form $T(S)$ [...].

Condition on $\xi = \{\xi_1, \dots, \xi_n\}$. Then terms $T(S)$, as above, yield
(A.l) E{PiRM) = —^S2( n MC^,m)+0{n-'Xn). 
Thus, 

[/2 = E(P(ii)|0p-''-m, 
We begin by considering $\operatorname{Var}(U_1 \mid \xi)$, which we can write as

$$\sum \operatorname{cov}(T(S_1), T(S_2) \mid \xi)\, \rho_n^{-2q},$$

where the sum ranges over all $S_1 \sim R$, $S_2 \sim R$.

If $E(S_1) \cap E(S_2) = \emptyset$ the covariance is 0. In general, suppose the graph
$S_1 \cap S_2$ has c vertices and d edges. Since R is acyclic any subgraph is acyclic.

By Corollary 3.2 of Chartrand, Lesniak and Behzad (1986), for every acyclic
graph, $|V(S)| \ge |E(S)| + 1$. Now,

(A.2) $\rho_n^{-2q} \operatorname{cov}(T(S_1), T(S_2) \mid \xi) \le n^{-2p} \rho_n^{-d} \prod_{(i,j) \in S_1 \cup S_2} w(\xi_i, \xi_j)$

since, if d > 1, 

E\Y{{Aij : (ij) e s^nS^}ll{Afj : (i, j) G 5i n ^2}!^" 

(A.3) 

There are $O(n^{2p-c})$ terms in (A.1) which have c vertices in common.
Therefore by (A.2) the total contribution of all such terms to $\operatorname{Var}(U_1)$ is

$$O\Bigl(n^{-c} \rho_n^{-d} \int w^{2q}(u, v)\, du\, dv\Bigr),$$

after using Hölder's inequality on $E \prod\{w(\xi_i, \xi_j) : (i, j) \in S_1 \cup S_2\}$. From (A.3)
and our assumptions we conclude that

$$\operatorname{Var}(U_1) = O(n^{-1}\lambda_n^{-1}) = o(n^{-1}),$$

if $\lambda_n \to \infty$. On the other hand, $U_2$ [...] is a U-statistic. Its kernel [...].

Thus, $\sqrt{n}\,(U_1, U_2)$ are jointly asymptotically Gaussian; see, for instance,
Serfling (1980).

Since $T_1, U_1 = o_P(n^{-1/2})$ if $\lambda_n \to \infty$, the result follows in that case. If
$\lambda_n = O(1)$, we note that $\sqrt{n}\,(T_1, U_1)$ are sums of q-dependent random variables in
the sense of Bulinski [see Doukhan (1994)] and hence, given $\xi$, are jointly
asymptotically Gaussian. It is not hard to see that the limiting conditional
covariance matrix is independent of $\xi$, as it was for $T_1$ marginally. By Lemma
1 again, $(T_1, U_1)$ and $(T_2, U_2)$ are asymptotically independent and (a) and (b)
follow.

Finally we prove (c). To have consistency for $\hat P(R)$, $\check P(R)$ and
hence for $\hat Q(R)$, $\check Q(R)$ by (2.3), we need to argue that if $S \subset R$,
$c = |S| \le p$, $|E(S)| = d$, then for a universal M,

$$n^{-c} \rho_n^{-d} \le M n^{-[\ldots]}.$$

Since $\rho_n = \lambda_n/n$ we obtain [...]. For fixed $c \ge 1$ this is maximized by
$d = $ [...]; $n^{[\ldots]}$ is maximized for $c \le p$ by $c = p$. □

Proof of Theorem 2. Since T corresponds to the canonical h,

$$T(1)(\xi) = v_{(j)}, \qquad \sum_{k=1}^{j-1} \pi_k \le \xi < \sum_{k=1}^{j} \pi_k, \quad 1 \le j \le K,$$

where $v_{(1)} < \dots < v_{(K)}$ are the ordered $\{v_j\}$, $v_j = \sum_i \pi_i F_{ij}$. By a theorem
of Hausdorff and Hamburger [Feller (1971)], the distribution of the random
variable $T(1)(\xi_1)$, which takes on only the K distinct values above, is completely
and uniquely determined by its first $2K - 1$ moments $E(T(1)(\xi_1))^l$,
$l = 1, \dots, 2K - 1$. Therefore for our model $\pi_1, \dots, \pi_K$ are completely determined
since $T(1)(\xi_1)$ takes values $v_j$ with probability $\pi_j$, $j = 1, \dots, K$.
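
The moment-determination step can be illustrated numerically. The following is a minimal sketch, assuming exact (noise-free) moments and using a classical Prony-type construction that is not taken from the paper: the atoms are the roots of a monic polynomial whose coefficients solve a Hankel system in the moments, and the weights then solve a Vandermonde system. The function name `atoms_from_moments` and the toy values are hypothetical.

```python
import numpy as np

def atoms_from_moments(moments, K):
    """Recover K atoms and weights from moments m_0, ..., m_{2K-1}, with m_0 = 1."""
    m = np.asarray(moments, dtype=float)
    # Hankel system for c_0..c_{K-1} of the monic polynomial
    # x^K + c_{K-1} x^{K-1} + ... + c_0 whose roots are the atoms.
    H = np.array([[m[r + i] for i in range(K)] for r in range(K)])
    c = np.linalg.solve(H, -m[K:2 * K])
    atoms = np.sort(np.real(np.roots(np.concatenate(([1.0], c[::-1])))))
    # Vandermonde system V[l, k] = atoms[k] ** l matches moments m_0..m_{K-1}.
    V = np.vander(atoms, K, increasing=True).T
    weights = np.linalg.solve(V, m[:K])
    return atoms, weights

# Toy check with K = 3 made-up values and probabilities.
K = 3
true_atoms = np.array([0.2, 0.5, 0.9])
true_weights = np.array([0.3, 0.5, 0.2])
moments = [np.sum(true_weights * true_atoms ** l) for l in range(2 * K)]
atoms, weights = atoms_from_moments(moments, K)
```

The inversion uses $m_0 = 1$ together with the first $2K - 1$ moments, matching the count in the argument above. With estimated rather than exact moments the Hankel system can be badly conditioned, which is consistent with the ill-conditioning noted in Section 6.1.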

Let $v^{(1)} = (v_{(1)}, \dots, v_{(K)})^T = F\pi$. Note that $E(T^2(1)(\xi_1))^l$, $l = 1, \dots, 2K - 1$,
similarly determines the distribution of $T^2(1)(\xi_1)$. Hence, [...]

Continuing we see that the $(K - 1)(2K - 1)$ moments $\{\tau_{kl} : 2 \le k \le K,
1 \le l \le 2K - 1\}$ yield

(A.4) $v^{(j)} = F v^{(j-1)}$

for $j = 1, \dots, K$ where $v^{(0)} = \pi$.

Given $\pi, v^{(1)}, \dots, v^{(K-1)}$ linearly independent, we can compute F since by
(A.4), we can write

$$F V^{(1)} = V^{(2)},$$

where $V^{(1)} = (v^{(0)}, \dots, v^{(K-1)})$ and $V^{(2)} = (v^{(1)}, \dots, v^{(K)})$, and hence

$$F = V^{(2)} [V^{(1)}]^{-1}.$$

Consistency and $\sqrt{n}$-consistency follow from Theorem 1 and the delta method.
□
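
The final linear-algebra step can be checked on a toy example. The sketch below takes the recursion (A.4) at face value, $v^{(j)} = F v^{(j-1)}$ with $v^{(0)} = \pi$, and recovers F from the stacked vectors. The particular matrix `F` and vector `pi` are made-up values chosen so that the required linear independence holds; they are not from the paper.

```python
import numpy as np

K = 3
# Hypothetical symmetric block-probability matrix and block proportions.
F = np.array([[0.5, 0.2, 0.1],
              [0.2, 0.4, 0.3],
              [0.1, 0.3, 0.6]])
pi = np.array([0.2, 0.3, 0.5])

# Generate v^(0), ..., v^(K) by the recursion v^(j) = F v^(j-1).
vs = [pi]
for _ in range(K):
    vs.append(F @ vs[-1])

V1 = np.column_stack(vs[:K])        # columns v^(0), ..., v^(K-1)
V2 = np.column_stack(vs[1:K + 1])   # columns v^(1), ..., v^(K)

# F V1 = V2, so F = V2 V1^{-1}; solve the transposed system instead of inverting.
F_hat = np.linalg.solve(V1.T, V2.T).T
```

Solving the transposed system avoids forming an explicit inverse. If the vectors were nearly linearly dependent, `V1` would be close to singular and the recovered matrix would be unstable.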

Proof of Proposition 2. Note that

(A.5) $E \exp\{s T^l(1)(\xi)\} = E \exp\{s E(w(\xi, \xi_1) \cdots w(\xi_{l-1}, \xi_l) \mid \xi)\}
\le E \exp\{s\, w(\xi, \xi_1) \cdots w(\xi_{l-1}, \xi_l)\}.$

Taking $\xi = \xi_0$,

(A.6) $(\mathrm{A.5}) \le E \exp\Bigl\{ |s|\, \frac{1}{l} \sum_{j=0}^{l-1} |w|^l(\xi_j, \xi_{j+1}) \Bigr\}$

by the arithmetic/geometric mean and Minkowski inequalities. By Hölder's
inequality (A.6) is bounded by

$$\prod_{j=0}^{l-1} \bigl[ E \exp\{|s|\, |w|^l(\xi_j, \xi_{j+1})\} \bigr]^{1/l}.$$

It is easy to show that (A') implies that $E \exp\{\sum_{l=1}^{m} s_l T^l(1)(\xi)\}$ converges
for $0 < |s| \le \varepsilon$ for some $\varepsilon$ depending on m, and hence by a classical result
that (A') implies (A). □

Proof of Theorem 3. Clearly w determines the joint distribution
of moments. We can take $\tau_w(\xi) = T_w(1)(\xi)$ monotone, corresponding to
the canonical w, to be the quantile function of the marginal distribution
of $T_w(1)(\xi)$. Now the joint distribution of $(T_w(1)(\xi), T_w^2(1)(\xi))$ determines
$\tau_w(\cdot)$, $T_w\tau_w(\cdot)$, except on a set of measure 0. Continuing this argument, we
can determine the entire sequence of functions $\tau_w, T_w\tau_w, T_w^2\tau_w, \dots$. Since
$T_w$ is bounded self-adjoint, these functions are all in $L_2$. Let

$$g_k^{(1)}(\cdot) = \frac{T_w g_{k-1}^{(1)}}{|g_{k-1}^{(1)}|}, \qquad g_0^{(1)}(\cdot) = 1,$$

where $|f|$ and $(f, g)$ are, respectively, the norm and the inner
product in $L_2$. Then $g_k^{(1)} \to_{L_2} \lambda_1\phi_1$, where $\lambda_1$ is the first eigenvalue, $\phi_1$ the
first eigenfunction, and $g_k^{(1)}/|g_k^{(1)}| \to \phi_1$. This is just the "powering up" method
applied to the function 1, with convergence guaranteed since $\lambda_1$ is unique
and 1 is not orthogonal to $\phi_1$ or any other eigenfunction. So $\lambda_1$ and $\phi_1$ are
also determined. Thus we can compute $g_0^{(2)} = 1 - (1, \phi_1)\phi_1$. Further,

$$g_1^{(2)} = \frac{T_w g_0^{(2)}}{|g_0^{(2)}|} = \frac{T_w(1) - \lambda_1 (1, \phi_1)\phi_1}{|g_0^{(2)}|}$$

is computable since we know $T_w(1)(\cdot)$ and the eigenfunction $\phi_1$ and eigenvalue
$\lambda_1$. More generally, $T_w^k g_0^{(2)}$ and $|T_w^k g_0^{(2)}|$ can be similarly determined. Then, by the
same argument as before, using 1 not orthogonal to $\phi_2$, we obtain
$g_k^{(2)} \to_{L_2} \lambda_2\phi_2$ and $g_k^{(2)}/|g_k^{(2)}| \to_{L_2} \phi_2$. Now form
$g_0^{(3)} = 1 - (1, \phi_1)\phi_1 - (1, \phi_2)\phi_2$
and proceed as before, and continue to determine $\lambda_k$, $\phi_k$ for all k. This and
(3.1) complete the proof. □
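
The "powering up" argument has a direct finite-dimensional analogue, sketched below under stated assumptions. The kernel $w(u, v) = e^{-(u-v)^2}(1+u)(1+v)$, the grid size, and the helper names `inner` and `power_up` are all illustrative choices, not taken from the paper; the kernel is picked only because it is symmetric, positive semidefinite, and lacks the symmetry that would make the constant function orthogonal to the second eigenfunction. In floating point the found eigenfunctions are projected out at every step, whereas the exact argument only needs to do so once.

```python
import numpy as np

def inner(f, g):
    """Discrete stand-in for the L2 inner product on [0, 1]."""
    return np.mean(f * g)

def power_up(T, start, found=(), n_iter=500):
    """Power iteration from `start`, staying orthogonal to eigenfunctions in `found`."""
    g = np.asarray(start, dtype=float).copy()
    for _ in range(n_iter):
        for phi in found:
            g = g - inner(g, phi) * phi   # remove components already identified
        g = T @ g                          # apply the (discretized) operator
        g = g / np.sqrt(inner(g, g))       # normalize to unit L2 norm
    lam = inner(g, T @ g)                  # Rayleigh quotient of the unit vector
    return lam, g

n = 50
u = (np.arange(n) + 0.5) / n
W = np.exp(-(u[:, None] - u[None, :]) ** 2) * np.outer(1 + u, 1 + u)
T = W / n                                  # (T g)(u) approximates the integral of w(u, v) g(v) dv

one = np.ones(n)
lam1, phi1 = power_up(T, one)
g2 = one - inner(one, phi1) * phi1         # deflated starting function
lam2, phi2 = power_up(T, g2, found=(phi1,))
```

The two eigenvalues can be compared with `np.linalg.eigvalsh(T)` as a sanity check. The rate of convergence at each stage is governed by the ratio of successive eigenvalues, which is one way to see why estimating many eigenpairs from noisy iterates is delicate.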

Proof of Theorem 5. Note first that (4.1) implies that the $M_2$ distance
between [...] and the empirical distribution of $\{\theta_m(\xi_i)\}$ tends to 0.
The first conclusion of the theorem now follows by the Glivenko-Cantelli
theorem and the Law of Large Numbers.

To show (4.1), note that

(A.7) [...]

where $D_i^{(m)}$ = [...]. By Theorem 1, we can replace $\bar D$ by $\lambda_n$ if
$\lambda_n \ge \varepsilon$. Then (A.7) is implied by

(A.8) [...]

Now,

(A.9) [...] $R = \{(i, i_1), \dots, (i_{m-1}, j)\}$, all vertices distinct$\}$,

where $w_E(R) = \prod_{(a,b) \in E(R)} w(\xi_a, \xi_b)$. Further, (A.9) is a U-statistic of order
m under $|w|_{2m} < \infty$ and

[...] $\le C|w|_{2m}/n$

by standard theory [Serfling (1980)].

Since $E(w_E(R) \mid \xi_i) = \theta_m(\xi_i)$, we can consider

(A.10) [...]

Note that $R = \{(i, i_1), (i_1, i_2), \dots, (i_{m-1}, j)\}$ is acyclic if all vertices are
distinct. As in the proof of Theorem 1, all nonzero covariance terms in (A.10)
are of order $\rho_n^{2m-d} n^{2m-c}$, where $c \ge d$ since the intersection graphs all have
i in common but are otherwise acyclic. The largest order term corresponds
to $c = d = m$, so that

[...] $\le C\lambda_n^{[\ldots]}$,

where C depends on $|w|_{2m}$ only. Thus (A.8) holds if $\lambda_n \to \infty$. □
Acknowledgment. Thanks to Allan Sly for a helpful discussion. 

REFERENCES 

Aldous, D. J. (1981). Representations for partially exchangeable arrays of random
variables. J. Multivariate Anal. 11 581-598. MR0637937

Barabasi, A.-L. and Albert, R. (1999). Emergence of scaling in random networks. 
Science 286 509-512. MR2091634 

Bickel, P. J. and Chen, A. (2009). A nonparametric view of network models and
Newman-Girvan and other modularities. Proc. Natl. Acad. Sci. USA 106 21068-21073. 

Bollobás, B., Janson, S. and Riordan, O. (2007). The phase transition in
inhomogeneous random graphs. Random Structures Algorithms 31 3-122. MR2337396

Chartrand, G., Lesniak, L. and Behzad, M. (1986). Graphs and Digraphs, 2nd ed. 
Wadsworth and Brooks, Monterey, CA. 

Chatterjee, S. and Diaconis, P. (2011). Estimating and understanding exponential 
random graph models. Unpublished manuscript. Available at arXiv:1102.2650. 

Chung, F. and Lu, L. (2002). Connected components in random graphs with given ex- 
pected degree sequences. Ann. Comb. 6 125-145. MR1955514 

de Solla Price, D. J. (1965). Networks of scientific papers. Science 149 510-515.

Decelle, A., Krzakala, F., Moore, C. and Zdeborova, L. (2011). Asymptotic analy- 
sis of the stochastic block model for modular networks and its algorithmic applications. 
Available at arXiv:1109.3041. 

Diaconis, P. and Janson, S. (2008). Graph limits and exchangeable random graphs. 
Rend. Mat. Appl. (7) 28 33-61. MR2463439 

Doukhan, P. (1994). Mixing: Properties and Examples. Lecture Notes in Statistics 85.
Springer, New York. MR1312160 

Feller, W. (1971). An Introduction to Probability Theory and Its Applications. Vol. II,
2nd ed. Wiley, New York. MR0270403 

Frank, O. and Strauss, D. (1986). Markov graphs. J. Amer. Statist. Assoc. 81 832-842. 
MR0860518 

Handcock, M. (2003). Assessing degeneracy in statistical models of social networks. 
Working Paper 39, Center for Statistics and the Social Sciences. 

Handcock, M. S., Raftery, A. E. and Tantrum, J. M. (2007). Model-based clustering 
for social networks. J. Roy. Statist. Soc. Ser. A 170 301-354. MR2364300 

Hoff, P. D. (2007). Modeling homophily and stochastic equivalence in symmetric
relational data. In Advances in Neural Information Processing Systems 19. MIT Press,
Cambridge, MA.

Hoff, P. D., Raftery, A. E. and Handcock, M. S. (2002). Latent space approaches
to social network analysis. J. Amer. Statist. Assoc. 97 1090-1098. MR1951262 

Holland, P. W., Laskey, K. B. and Leinhardt, S. (1983). Stochastic blockmodels:
First steps. Social Networks 5 109-137. MR0718088

Holland, P. W. and Leinhardt, S. (1981). An exponential family of probability distri- 
butions for directed graphs. J. Amer. Statist. Assoc. 76 33-65. MR0608176 

Hoover, D. (1979). Relations on probability spaces and arrays of random variables. 
Technical report, Institute for Advanced Study, Princeton, NJ. 

Kallenberg, O. (2005). Probabilistic Symmetries and Invariance Principles. Springer, 
New York. MR2161313 






Karrer, B. and Newman, M. E. J. (2011). Stochastic blockmodels and community 
structure in networks. Phys. Rev. E (3) 83 016107. MR2788206 

Newman, M. E. J. (2006). Finding community structure in networks using the eigenvec- 
tors of matrices. Phys. Rev. E (3) 74 036104. MR2282139 

Newman, M. E. J. (2010). Networks: An Introduction. Oxford Univ. Press, Oxford. 
MR2676073 

Nowicki, K. and Snijders, T. A. B. (2001). Estimation and prediction for stochastic
blockstructures. J. Amer. Statist. Assoc. 96 1077-1087. MR1947255

Picard, F., Daudin, J.-J., Koskas, M., Schbath, S. and Robin, S. (2008). Assessing
the exceptionality of network motifs. J. Comput. Biol. 15 1-20. MR2383618

Robins, G., Snijders, T., Wang, P., Handcock, M. and Pattison, P. (2007). Recent
developments in exponential random graphs models (p*) for social networks. Social
Networks 29 192-215.

Rohe, K., Chatterjee, S. and Yu, B. (2011). Spectral clustering and the
high-dimensional stochastic block model. Ann. Statist. To appear.

Serfling, R. J. (1980). Approximation Theorems of Mathematical Statistics. Wiley, New 
York. MR0595165 

Shalizi, C. R. and Rinaldo, A. (2011). Projective structure and parametric inference
in exponential families. Carnegie Mellon Univ. Unpublished manuscript. 

Snijders, T. A. B. and Nowicki, K. (1997). Estimation and prediction for stochas- 
tic blockmodels for graphs with latent block structure. J. Classification 14 75-100. 
MR1449742 



P. J. Bickel

Department of Statistics 
University of California 
367 Evans Hall 

Berkeley, California 94720-3860 
USA 

E-MAIL: bickel@stat.berkeley.edu 

E. Levina 
Department of Statistics 
University of Michigan 
439 West Hall 
1085 S. University Ave. 
Ann Arbor, Michigan 48109-1107 
USA 

E-MAIL: elevina@umich.edu 



A. Chen 
Google Inc. 

1600 Amphitheatre Pkwy 
Mountain View, California 94043 
USA 

E-MAIL: aiyouchen@google.com



